Udacity SDC Nanodegree - Advanced Lane Lines Project

Project Information

The goals / steps of this project are the following:

  • Compute the camera calibration matrix and distortion coefficients given a set of chessboard images.
  • Apply the distortion correction to the raw image.
  • Use color transforms, gradients, etc., to create a thresholded binary image.
  • Apply a perspective transform to rectify binary image ("birds-eye view").
  • Detect lane pixels and fit to find lane boundary.
  • Determine curvature of the lane and vehicle position with respect to center.
  • Warp the detected lane boundaries back onto the original image.
  • Output visual display of the lane boundaries and numerical estimation of lane curvature and vehicle position.

The images for camera calibration are stored in the folder called camera_cal. The images in test_images are for testing your pipeline on single frames. The video called project_video.mp4 is the video your pipeline should work well on. challenge_video.mp4 is an extra (and optional) challenge for you if you want to test your pipeline.

If you're feeling ambitious (totally optional though), don't stop there! We encourage you to go out and take video of your own, calibrate your camera and show us how you would implement this project from scratch!

Some thoughts

I have spent much more time on this project than on any previous one. It has been good practice for getting familiar with many computer vision concepts and techniques, such as camera calibration and perspective transforms. But to be frank, I am still not convinced that the algorithms used here are general and robust enough to be useful in practical self-driving, with the main reasons being:

  1. There are too many "ad-hoc" assumptions about the image, e.g., the brightness/color of the lanes compared to other objects, road construction standards (e.g., for the meter-per-pixel estimate), etc. As a consequence, there are many ways to break these assumptions, e.g., driving from urban roads onto a highway in another country.
  2. The rules that encode the "lane-detection" knowledge are very complicated compared to data-driven approaches such as semantic segmentation. Usually this means the model has a bigger chance of overfitting, e.g., when some parts of the lanes are hidden. And this cannot be remedied by using more and more data! This makes the results of the method less reliable in practice.
  3. The outcomes of the method, namely curvature and position, are based on rough estimation and may not be accurate enough for steering purposes.

A lot of literature on lane detection by traditional computer vision can be found in previous publications, but I think most of the focus has shifted toward more data-driven approaches recently. I hope more advanced techniques will be covered in the later part of the course to address those shortcomings.

Code Organization

  • The code is wrapped in the sdclane package, with the main functions implemented in the following files:
    • camera.py: camera calibration
    • line_detection.py: line detection by several methods, such as Sobel filters in different color spaces
    • transform.py: perspective transform to get a bird's-eye view of the lanes
    • lane_detection.py: the main class LaneDetector, which builds the pipeline from the previous steps to detect/estimate lanes in images and videos
    • config.py: configuration, such as the images for camera calibration and testing
    • utility.py: helper functions for the pipeline and image I/O
In [1]:
%matplotlib inline

from sdclane import config, utility, camera, line_detection, transform, lane_detection

import cv2
import matplotlib.pyplot as plt
import numpy as np

from moviepy.editor import VideoFileClip
from IPython.display import HTML

np.random.seed(1337)

Project Rubric walk-through

1. Camera Calibration

OpenCV functions or other methods were used to calculate the correct camera matrix and distortion coefficients using the calibration chessboard images provided in the repository. The distortion matrix should be used to un-distort the test calibration image provided as a demonstration that the calibration is correct.

Image undistortion is implemented in the sdclane.camera.CameraCalibrator class.

  • its fit method takes a set of chessboard images and estimates the camera matrix and distortion coefficients as self.M and self.d
  • its undistort method takes a raw image and undistorts it accordingly
  • during the implementation, I found that some images provided in the camera_cal folder have different specifications, e.g., numbers of row/column corners. Those are simply ignored. The results of undistortion are shown below.

2. Pipeline for single images

In [2]:
## helper to display image grid
def grid_plot(image_cols):
    ncols = len(image_cols)
    nrows = len(image_cols[0][1])
    fig, axes = plt.subplots(nrows, ncols, figsize = (8*ncols, 4*nrows))
    fig.tight_layout()
    fig.subplots_adjust(wspace = 0.1, hspace=0.1, )

    for r, ax in enumerate(axes):
        for c, (colname, imgs) in enumerate(image_cols):
            img = imgs[r]
            cmap = plt.cm.gray if img.ndim < 3 else None
            ax[c].imshow(img, cmap=cmap)
            ax[c].set_axis_off()
            ax[c].set_title(colname)
In [3]:
## load the test images
test_imgs = utility.read_rgb_imgs(config.test_img_files)

2.1 Distortion Correction

Distortion correction that was calculated via camera calibration has been correctly applied to each image.

The results of distortion correction are shown below. In detail, the sdclane.camera.CameraCalibrator.fit() method reads a set of chessboard images from the camera_cal/ folder (assumed to be from the same camera), on which cv2.calibrateCamera is called to estimate the camera matrix and distortion coefficients. These estimated parameters are later used in the undistort() method.

Visually there is a slight rightward shift in the undistorted images.

In [4]:
undistort = camera.build_undistort_function()

undistorted_imgs = list(map(undistort, test_imgs))
grid_plot( [("original", test_imgs), 
           ("undistorted", undistorted_imgs)])
Warning: Cannot find corner points in image
Warning: Cannot find corner points in image

2.2 Line detection

At least two methods (i.e., color transforms, gradients) have been combined to create a binary image containing likely lane pixels. There is no "ground truth" here, just visual verification that the pixels identified as part of the lane lines are, in fact, part of the lines.

There are two steps implemented for line detection:

  • Detection of lines from undistorted images by a combination of:
    • Sobel in x on a gray image from the HLS channels - detecting lines with horizontal gradients
    • two Sobel direction thresholds, $\arctan(\frac{sobel_y}{sobel_x})$, on the HLS channels - picking up left and right lines within certain angle ranges
    • a combination of the above three - lines_with_gradx AND (left_line OR right_line).
    • I found the L and S channels of HLS images are especially good at detecting bright lines in spite of color changes and shadows.

This is implemented in sdclane.line_detection.LineDetector.detect()

  • Cropping to a trapezoidal ROI at the bottom of the image.

The results of the pipeline are shown below.

In [5]:
detect_line = line_detection.LineDetector().detect
roi_crop = transform.build_trapezoidal_bottom_roi_crop_function()

line_imgs = list(map(detect_line, undistorted_imgs))
roi_line_imgs = list(map(roi_crop, line_imgs))

grid_plot([("undistorted", undistorted_imgs),
          ("lines", line_imgs),
          ("lines in ROI", roi_line_imgs)])

2.3 Perspective transform

OpenCV function or other method has been used to correctly rectify each image to a "birds-eye view"

The perspective transform is implemented in the sdclane.transform.PerspectiveTransformer class, in several steps:

  1. pick a training image where the two lanes are roughly linear and parallel to each other. I picked test_imgs[3] as the reference by visual inspection. This is customizable in the sdclane.config package.
  2. detect the two lanes as the two legs of a trapezoid in the original space, and map them to a rectangle in the warped space.
    • Here I use k-means to separate the pixels into left and right lanes, and a RANSAC model to robustly estimate a line for each.
    • the choice of the target rectangle is relatively arbitrary, as long as the meter-per-pixel estimate later on is consistent with it.
  3. estimate the transform matrix and its inverse by cv2.getPerspectiveTransform on the trapezoid and rectangle.
  4. estimate the meter-per-pixel values x_mpp and y_mpp, so that they can later be used to estimate other parameters such as curvatures and center offsets.
    • the estimate of x_mpp is relatively straightforward. We assume the width of the lane is always 3 meters, and x_mpp is the ratio of the lane width w.r.t. the width of the target rectangle in the warped space.
    • I estimated y_mpp in a slightly different way from the one in the class - I chose the longest segment of the dotted lane and assumed it to be 3 meters in reality (as suggested in the class). This gives different curvature and offset estimates later on, but I am not really sure which is more (or less) accurate, because the method used in the class is also quite ad hoc.
  5. after estimating the transform matrix, any new image (an original RGB image or simply its binary line image) can be transformed to a bird's-eye view. These are implemented as PerspectiveTransformer.transform() and PerspectiveTransformer.binary_transform().

The results of the transform are shown below. Visually, the lanes in the bird's-eye view are clear and roughly parallel.

In [6]:
transformer = transform.build_default_warp_transformer()

warped_imgs = list(map(transformer.transform, test_imgs))
grid_plot([("undistorted", undistorted_imgs),
          ("birdeye view", warped_imgs)])
Warning: Cannot find corner points in image
Warning: Cannot find corner points in image

2.4 Lane fitting

Methods have been used to identify lane line pixels in the rectified binary image. The left and right line have been identified and fit with a curved functional form (e.g., spline or polynomial).

The same techniques could be used to detect the lane pixels in the warped images. However, I chose to directly transform the line pixels from the original image space to the bird's-eye view space, using PerspectiveTransformer.binary_transform(). This is based on the observations that (1) the line detection in the original images is already visually good, (2) in the bird's-eye view the lanes are still clear, and (3) it makes the code simpler.

The resulting lane pixels in the bird's-eye view are shown below. We can see there is some noise in the final lane images, which needs to be removed before parameter estimation.

In [7]:
lane_imgs = list(map(transformer.binary_transform, roi_line_imgs))
grid_plot([
    ("undistorted", undistorted_imgs),
    ("lanes in original space", roi_line_imgs),
    ("lanes in bird-eye view", lane_imgs)
])

2.5 Lane parameter estimation

Here the idea is to take the measurements of where the lane lines are and estimate how much the road is curving and where the vehicle is located with respect to the center of the lane. The radius of curvature may be given in meters assuming the curve of the road follows a circle and the position of the vehicle within the lane may be given as meters off of center.

Now that we have the lane pixels in the bird's-eye view and the meter-per-pixel values for both x and y, estimating the curvature and the center offset is straightforward. The whole process is implemented in sdclane.lane_detection.LaneDetector.detect_image():

  1. noise processing - remove small holes and objects in the lane image by morphological operations. This is implemented in LaneDetector.get_lane_pixels().
  2. divide the pixels into left and right lanes by sliding a window vertically from top to bottom, separating left and right as pixel groups that are apart from each other. This is also implemented in LaneDetector.get_lane_pixels().
  3. after getting the pixels for each lane, a 2nd-order polynomial is fit to each lane, based on which the radius of curvature and center offset are calculated in LaneDetector.estimate_lane_params(). To express these parameters in real-world units, the meter-per-pixel values x_mpp and y_mpp previously estimated in the transform are used.
  4. the parameters of the two lanes are used to validate the quality of the detection. This is important later for videos, where either fast tracking or a search from scratch is used, depending on whether the detection result is good enough.
  5. the estimated 2nd-order polynomial approximations of the lanes, together with their middle curve, are overlaid onto the image for a further visual check.
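The curvature calculation in step 3 boils down to the standard radius-of-curvature formula for a 2nd-order fit $x = Ay^2 + By + C$, namely $R = (1 + (2Ay + B)^2)^{3/2} / |2A|$ evaluated in meters. A minimal sketch (the function name and signature are illustrative; the real code lives in LaneDetector.estimate_lane_params):

```python
import numpy as np

def lane_curvature(xs_pix, ys_pix, x_mpp, y_mpp, y_eval_pix):
    """Fit a 2nd-order polynomial in meters and return the radius of
    curvature (m) at y_eval_pix."""
    ys_m = np.asarray(ys_pix, dtype=float) * y_mpp  # convert pixels to meters
    xs_m = np.asarray(xs_pix, dtype=float) * x_mpp
    A, B, _ = np.polyfit(ys_m, xs_m, 2)
    y = y_eval_pix * y_mpp
    return (1 + (2 * A * y + B) ** 2) ** 1.5 / abs(2 * A)
```

Since the formula is evaluated on coefficients fitted in meters, the result depends directly on the x_mpp and y_mpp estimates discussed earlier.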

2.6 Visual validation

The fit from the rectified image has been warped back onto the original image and plotted to identify the lane boundaries. This should demonstrate that the lane boundaries were correctly identified.

The whole pipeline from a camera image, to undistorted, to lane detection, to bird-eye view, and finally parameter estimation and visual check, is implemented in the sdclane.lane_detection.LaneDetector.test_image() method.

The final results of the pipeline are depicted below.

  • the two lanes and their middle curve are colored in red, even though the left lane might be hidden by the yellow color of the real lane.
  • visually, the polynomial fits of the lanes align quite well with the real lanes in the images.
  • however, the curvature and center offset depend on the meter-per-pixel values as mentioned previously. These are still quite subjective and inaccurate in my opinion, so the values might differ from those given in the class - this is the part I am not really convinced by.
In [8]:
# build lane detector
lane_detector = lane_detection.LaneDetector()

lane_estimates = [lane_detector.detect_image(img)[1] for img in test_imgs]

grid_plot([
    ("camera images", test_imgs),
    ("lane estimates", lane_estimates)
])
Warning: Cannot find corner points in image
Warning: Cannot find corner points in image
Warning: Cannot find corner points in image
Warning: Cannot find corner points in image

3. Pipeline for video

3.1 Lane detection

The image processing pipeline that was established to find the lane lines in images successfully processes the video. The output here should be a new video where the lanes are identified in every frame, and outputs are generated regarding the radius of curvature of the lane and vehicle position within the lane. The identification and estimation don't need to be perfect, but they should not be wildly off in any case. The pipeline should correctly map out curved lines and not fail when shadows or pavement color changes are present.

The lane detection pipeline for videos is also implemented in the sdclane.lane_detection.LaneDetector class, in the LaneDetector.detect_video() method. The result is shown below.

3.2 Lane search for first frame

In the first few frames of video, the algorithm should perform a search without prior assumptions about where the lines are (i.e., no hard coded values to start with). Once a high-confidence detection is achieved, that positional knowledge may be used in future iterations as a starting point to find the lines.

The LaneDetector.detect_video() method is implemented in a way that

  1. it always detects lanes by performing a full search (as in the test_image() method) if no estimate from the previous frame is available.
  2. if an estimate from the previous frame is available, it tries a faster search by looking in a small neighborhood of the last detection, assuming the positions of the new lanes will not be too different from the last ones. This is implemented in the LaneDetector.process_frame() method.
  3. however, if the detection result from step 2 is not good enough (based on whether the two lanes are roughly parallel in their linear parts), it goes back to using a full search for the next frame.
  4. if both the fast search and the full search fail for some frame, it simply reports no lanes for the current frame.
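The control flow of steps 1-4 can be sketched roughly as follows. The function names and the (detection, prior) return shape are placeholders, not the actual sdclane API:

```python
def process_frame(frame, prev_fit, full_search, fast_search, is_good):
    """Illustrative sketch of the full-search / fast-search fallback.

    Returns (detection, prior), where `prior` is what the next frame should
    use as prev_fit (None forces a full search on the next frame)."""
    if prev_fit is None:
        fit = full_search(frame)            # 1. no prior estimate: full search
    else:
        fit = fast_search(frame, prev_fit)  # 2. prior available: fast search
    if fit is not None and is_good(fit):
        return fit, fit                     # keep as the prior for the next frame
    return None, None                       # 3./4. bad or no detection: reset
```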

3.3 Tracking of different frames

As soon as a high confidence detection of the lane lines has been achieved, that information should be propagated to the detection step for the next frame of the video, both as a means of saving time on detection and in order to reject outliers (anomalous detections).

In detail, the "faster search" based on tracking the last frame works as follows:

  • generate sample points from the lane models of the last detection
  • find lane pixels for the current frame within the neighborhoods of the generated samples, using a sliding window.
  • estimate the current lane parameters from these detected lane pixels.

This is implemented in LaneDetector.process_frame(). There are many heuristic parameters hard-coded into the detection algorithm, such as the sliding-window size. I am not at all confident that it will work in new scenarios.
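A minimal numpy sketch of this neighborhood search: keep only the nonzero pixels within a margin of the previous polynomial, then refit. The margin value and the plain polyfit refit are assumptions about the implementation, not the actual sdclane code:

```python
import numpy as np

def fast_search(binary_warped, prev_fit, margin=80):
    """Sketch: restrict pixels to a band around the last fit, then refit."""
    ys, xs = np.nonzero(binary_warped)
    x_prev = np.polyval(prev_fit, ys)    # sample points from the last lane model
    keep = np.abs(xs - x_prev) < margin  # pixels near the last detection
    if keep.sum() < 3:                   # too few pixels to fit a parabola
        return None
    return np.polyfit(ys[keep], xs[keep], 2)
```

Returning None here is what would trigger the fallback to a full search described above.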

The result on project_video.mp4 is shown below. The algorithm works partially on the two challenge videos, when certain assumptions made in the code happen to hold in the video. However, I didn't go further and modify the code to work on these challenges. As mentioned above, I am not really convinced by the material in this project, so even if it succeeded on the challenge videos, I would have no confidence that it would work in new scenarios.

In [9]:
clip_output_file = 'marked_project_video.mp4'
clip = VideoFileClip("project_video.mp4")
clip_output = lane_detector.detect_video(clip)
%time clip_output.write_videofile(clip_output_file, audio=False)
[MoviePy] >>>> Building video marked_project_video.mp4
[MoviePy] Writing video marked_project_video.mp4
100%|█████████▉| 1260/1261 [09:05<00:00,  2.15it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: marked_project_video.mp4 

CPU times: user 1h 6min 8s, sys: 2min 12s, total: 1h 8min 20s
Wall time: 9min 6s
In [10]:
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format('marked_project_video.mp4'))
Out[10]:

4. Conclusion

The Readme file submitted with this project includes a detailed description of what steps were taken to achieve the result, what techniques were used to arrive at a successful result, what could be improved about their algorithm/pipeline, and what hypothetical cases would cause their pipeline to fail.

I have explained the general steps of the pipeline above. More details can be found in the referenced code. Comments are given across the files where necessary.

I have also shared my thoughts on why the method used here may not be satisfying in practice. And practically, my implementation is not really fast enough for real-time lane detection.

The detection of lanes is quite smooth across successive frames because a smoothing method has been implemented that takes moving averages of the estimates. However, the middle curve is still a little bumpy - adjusting the weight coefficient might help, but again it is another heuristic parameter that may not generalize to new cases.
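The moving-average smoothing mentioned here can be as simple as an exponential blend of the per-frame estimate into a running average; a sketch, where alpha is the heuristic weight in question and the function name is illustrative:

```python
import numpy as np

def smooth_fit(prev_avg, new_fit, alpha=0.2):
    """Sketch: blend the new per-frame estimate into the running average."""
    new_fit = np.asarray(new_fit, dtype=float)
    if prev_avg is None:  # first frame: nothing to blend yet
        return new_fit
    return (1 - alpha) * np.asarray(prev_avg, dtype=float) + alpha * new_fit
```

A smaller alpha gives smoother but laggier lanes, which is exactly the trade-off behind the bumpy middle curve.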

With that said, I have learned a lot about computer vision through this project, and in that sense I think I have achieved the goal!
